Abstract. We describe two means by which XSTAR, a code which computes
physical conditions and emission spectra of photoionized gases, has been paral- 
lelized. The first is pvm_xstar, a wrapper which can be used in place of the serial 
xstar2xspec script to foster concurrent execution of the XSTAR command line 
application on independent sets of parameters. The second is PModel, a plugin 
for the Interactive Spectral Interpretation System (ISIS) which allows arbitrary 
components of a broad range of astrophysical models to be distributed across 
processors during fitting and confidence limits calculations, by scientists with 
little training in parallel programming. Plugging the XSTAR family of analytic 
models into PModel enables multiple ionization states (e.g., of a complex ab- 
sorber/emitter) to be computed simultaneously, alleviating the often prohibitive 
expense of the traditional serial approach. Initial performance results indicate 
that these methods substantially enlarge the problem space to which XSTAR 
may be applied within practical timeframes. 



1. Introduction 

XSTAR is "a computer program for calculating the physical conditions and emis- 
sion spectra of photoionized gases" (Kallman & Bautista 2001); the science it 
facilitates may be described most concisely by paraphrasing the documentation: 
a spherical gas shell surrounding a central source of ionizing radiation absorbs 
some of this radiation and reradiates it in other portions of the spectrum. XS- 
TAR computes the effects on the gas of absorbing this energy, and the spectrum 
of reradiated light, while allowing for consideration of other sources (or sinks) 
of heat, such as mechanical compression & expansion, or cosmic ray scattering. 
Coded in Fortran 77, XSTAR may be used as either a standalone executable or 
in the form of analytic models like warmabs, with the latter being compiled into 
shared objects and dynamically loaded into spectral modeling tools such as ISIS 
(Houck 2002). We are presently using XSTAR in ISIS to model active galactic
nuclei and non-equilibrium ionization of photoionized plasmas. Relative to
classic spectral modeling conducted with interactive analysis tools, the scales of 
these efforts are large: analytic models with 20 or more components and roughly
300 parameters, scores of which may vary during fitting, or batch XSTAR runs
on thousands of individual sets of parameters. The compute time required in
both use cases, on the order of a week to a month for single end-to-end runs,
precludes traditional use of XSTAR, which is coded for serial execution on one
CPU. Compounding the problem is the fact that most research efforts require
multiple end-to-end runs, e.g. to experiment with different model components
or parameter values, which can extend analysis timeframes into several months.

Figure 1. The flow diagrams of classic xstar2xspec and its parallelized
cousin, pvm_xstar, are identical (left): both run xstinitable at outset
and xstar2table at completion. The only conceptual difference is that in
pvm_xstar the N jobs are distributed to multiple CPUs via PVM (right), and
executed in N unique directories to avoid FITS I/O and parameter file clashes.

2. Batch Execution of XSTAR 

Part of our non-equilibrium ionization modeling includes large-scale simulations, 
wherein the XSTAR application is repeatedly invoked over sets of unique input 
parameter tuples; one spectrum is generated per XSTAR run and saved as a 
FITS file, and these are collated into a single FITS table model that can be 
incorporated into an analytic model for fitting. Historically, this process has 
been driven by the serial xstar2xspec script bundled with XSTAR and outlined 
in Fig. 1. A representative simulation of 600 XSTAR jobs, generating power
spectra of Hercules X-1, consumed 26.4 hours of wallclock time on a single 2.6 GHz
AMD Opteron processor with 2 GB RAM; a linear scaling to 4200 jobs would
consume roughly 7.5 days on the same machine. In contrast, a similar physical simulation
of 4200 XSTAR jobs completed in 110 minutes when executed via pvm_xstar on
our Beowulf cluster of 52 2.4 GHz Opteron (4 GB RAM) processors. As shown
in Fig. 1, pvm_xstar consists of 4 scripts: 2 of these, pvm_xstar proper and
pvm_xstar_wrap, are coded in Bourne shell, while the master/slave scripts are
coded in S-Lang using the S-Lang PVM module (Davis et al. 2005; Noble et al.
2006) to interface with the Parallel Virtual Machine toolkit (Geist et al. 1994).
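
The job-farming pattern described above can be illustrated with a minimal Python sketch. It is not the pvm_xstar implementation, which is written in Bourne shell and S-Lang on top of PVM; here a thread pool from the Python standard library stands in for PVM, and run_job is a hypothetical placeholder that writes a small text file instead of invoking XSTAR. The parameter names and values are likewise assumptions. The only point shown is that each parameter set runs in its own directory, so that concurrent jobs cannot overwrite one another's output or parameter files.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id, params, root):
    """Hypothetical stand-in for one XSTAR run (no real XSTAR call is made)."""
    workdir = os.path.join(root, f"job_{job_id:04d}")
    os.makedirs(workdir)                      # one private directory per job
    outfile = os.path.join(workdir, "spectrum.txt")
    with open(outfile, "w") as fh:            # placeholder for the FITS output
        fh.write(f"{params['density']} {params['column']}\n")
    return outfile


def run_batch(param_sets, root, workers=4):
    """Farm independent parameter sets out to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_job, i, p, root)
                   for i, p in enumerate(param_sets)]
        return [f.result() for f in futures]  # results kept in job order


# Illustrative parameter grid; the names and values are assumptions.
param_sets = [{"density": d, "column": c}
              for d in (1e10, 1e11, 1e12) for c in (1e21, 1e22)]
root = tempfile.mkdtemp(prefix="xstar_batch_")
outputs = run_batch(param_sets, root)
print(len(outputs), "jobs written under", root)
```

In the real workflow the per-job spectra would then be collated into a single table model, the step performed by xstar2table in Fig. 1.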

3. XSTAR Analytic Modeling 

As noted earlier, XSTAR is also used in the form of dynamically loaded analytic 
models, as in this sequence of commands at the ISIS prompt: 

isis> load_data("my_data.pha") 
isis> model("warmabs(1) + warmabs(2) + hotabs(1)")
isis> set_params(...)
isis> fit 

Parameters [Variable] = 48 [21] 
Data bins = 3 
Chi-square = 1.1118061 

The second step defines a 3-component model, consisting of two XSTAR 
warmabs components and one XSTAR hotabs component. The performance
bottleneck here is that each component may take 15 or more seconds to evaluate 
just once on a modern CPU, or 45 seconds to compute the entire model expres- 
sion for every iteration of the fit loop initiated by step 4. A typical fit loop may 
contain hundreds of such iterations, with tens of thousands to millions of com- 
ponent evaluations often needed to conduct thorough walks through parameter 
space while generating error bars. In short, days or weeks of compute time can 
be needed for essential analysis when expensive models are involved. 
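
The scale of this cost can be checked with a few lines of arithmetic. The sketch below assumes the 15 seconds per component evaluation quoted above and two illustrative evaluation counts; the function name serial_days is introduced here only for the example.

```python
SECONDS_PER_DAY = 86400.0


def serial_days(n_component_evals, seconds_per_eval=15.0):
    """Wallclock days needed to run all component evaluations one at a time."""
    return n_component_evals * seconds_per_eval / SECONDS_PER_DAY


# One fit-loop iteration of the 3-component model at 15 s per component.
seconds_per_iteration = 3 * 15.0

# Two illustrative evaluation counts (assumed values, not measurements).
for n_evals in (10_000, 100_000):
    print(f"{n_evals:>7d} evaluations: {serial_days(n_evals):5.1f} days")
```

At 15 seconds each, ten thousand serial evaluations take about 1.7 days and one hundred thousand take about 17.4 days, consistent with the days-to-weeks estimate above.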

Latent Parallelism. These lengthy runtimes may be shrunk by observing that
there are two sources of parallelism inherent to model evaluation. First, when- 
ever model components are mathematically independent of one another they 
may be evaluated concurrently. In the above model, for example, each com- 
ponent may be evaluated simultaneously, potentially reducing the runtime of 
each fit loop iteration from 45 to 15 seconds (the theoretical maximum of lin- 
ear speedup on 3 CPUs). This component independence is common in model 
expressions, which are evaluated from left to right under the associativity and 
precedence rules of classic algebra. The second form of parallelism arises from 
bin independence within models: when evaluating the model on the i-th bin,
model(lo[i], hi[i], params), requires no knowledge of bins i-1 or i+1, then
wavelength/energy grids of size nbins may be trivially decomposed

lo[1, nbins] = [ lo[1, N], lo[N+1, 2N], ..., lo[nbins-N+1, nbins] ]
hi[1, nbins] = [ hi[1, N], hi[N+1, 2N], ..., hi[nbins-N+1, nbins] ]

into nbins/N subgrids, with each model(lo_subgrid[j], hi_subgrid[j], params)
evaluated concurrently. This is relatively common in models of X-ray spectra.
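
The grid decomposition can be sketched in Python as follows. This is an illustration under stated assumptions, not PModel code: the model function is a hypothetical bin-independent power law, and the helper names split and eval_subgrids are invented for the example. Python threads are used only to show the structure of the decomposition; they do not by themselves speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def model(lo, hi, params):
    """Hypothetical bin-independent model: each bin depends only on its own edges."""
    norm, index = params
    return [norm * (h - l) * (0.5 * (l + h)) ** (-index) for l, h in zip(lo, hi)]


def split(grid, n):
    """Cut a grid into consecutive subgrids of at most n bins each."""
    return [grid[i:i + n] for i in range(0, len(grid), n)]


def eval_subgrids(lo, hi, params, n, workers=4):
    """Evaluate the model on each subgrid concurrently, then concatenate in order."""
    pieces = list(zip(split(lo, n), split(hi, n)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda p: model(p[0], p[1], params), pieces)
        return [value for part in parts for value in part]


# A 12-bin grid from 1 to 13 (arbitrary units), cut into subgrids of N = 4 bins.
lo = [float(i) for i in range(1, 13)]
hi = [float(i) for i in range(2, 14)]
params = (2.0, 1.5)

full = model(lo, hi, params)
tiled = eval_subgrids(lo, hi, params, n=4)
print(tiled == full)
```

Because no bin depends on its neighbors, the concatenated subgrid results should match the single full-grid evaluation.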

The PModel plugin for ISIS was written to exploit these latent sources of
parallelism. Loaded at runtime by a simple require("pmodel") command, the
package adds 4 primary functions to ISIS: pm_add(), pm_mult(), pm_func(),
and pm_subgrid(M). The first three are stub models, in that they contribute
nothing to the physics being modeled, but can be used in a model expression to
identify which portions to evaluate concurrently. The fourth function is not a
stub model, but rather overrides the default model evaluation mechanisms in
ISIS with routines that decompose the model grid into M independent subgrids.
In this case the entire model is independently evaluated over pieces of the grid,
while the first group of functions evaluates pieces of the model independently
over the entire grid. Using PModel is easy: in the context of our XSTAR example
only step 2 would need to change, to

model("pm_add(warmabs(1), warmabs(2), hotabs(1))")



Note that in warmabs(1) and warmabs(2) the numbers within parentheses are not parameters
to the model, but rather are tags which uniquely identify instances of a given model type, so
that each instance may be evaluated with its own set of parameter values.

For every iteration of the ISIS fit loop this revised model expression would
cause the dispatch of each component evaluation to a distinct processor, with the 
results from each combined by a simple additive reduction operation. Although 
PModel may be used to distribute virtually any expensive model components, 
the same ease of use would apply: the parallel use case bears an overwhelming 
resemblance to the serial one, with the differences being simple to identify and 
implement. This means that end users need not learn to program for parallelism
in order to use multiple processors in their models, a classic barrier to the
adoption of parallel methods by non-specialists. The PModel functions will decompose
the model or grid and combine results with either additive, multiplicative, or 
arbitrary functional reduction operations, all transparent to the top-level user 
interface. Moreover, ISIS did not need to be recoded for parallelism, and in fact 
it does not even know the model is computed in parallel; this knowledge is com- 
pletely encoded within PModel, whose functions ISIS simply calls in the same 
serial manner it would for any other physical model component. We have used 
these techniques to reduce the compute time of models with 20+ components, 
containing 10 or more XSTAR components and hundreds of parameters, from 
4+ weeks when run serially to ~22 hours on the aforementioned Beowulf cluster.
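
The dispatch-and-reduce pattern described in this section can be sketched in Python. This is an analogy to the additive and multiplicative cases, not the PModel implementation: parallel_eval and the three toy components are hypothetical names introduced for the example, and a standard-library thread pool stands in for distribution across processors.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def parallel_eval(components, lo, hi, combine, workers=4):
    """Evaluate each component on the full grid concurrently, then reduce bin by bin."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: c(lo, hi), components))
    return [reduce(combine, per_bin) for per_bin in zip(*results)]


# Hypothetical cheap components standing in for expensive ones such as warmabs.
def comp_a(lo, hi):
    return [h - l for l, h in zip(lo, hi)]


def comp_b(lo, hi):
    return [2.0 for _ in lo]


def comp_c(lo, hi):
    return [l for l in lo]


lo = [1.0, 2.0, 3.0]
hi = [2.0, 3.0, 4.0]
components = [comp_a, comp_b, comp_c]

summed = parallel_eval(components, lo, hi, operator.add)    # additive reduction
product = parallel_eval(components, lo, hi, operator.mul)   # multiplicative reduction
print(summed, product)
```

Passing any other two-argument function as combine gives an arbitrary functional reduction, while the caller's code stays the same as in the serial case.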

4. Conclusion 

Together, pvm_xstar and PModel enable scientists to incorporate multiple pro- 
cessors in their XSTAR modeling without becoming experts in parallelism. 
Amortizing the evaluation of expensive XSTAR components over many CPUs 
allows larger and more physically realistic models to be computed, permitting 
us to probe thousands of physical scenarios in the time it has previously taken 
to compute only a handful of such models. Insofar as analytic modeling of ob- 
servational data is among the most common scientific activities in astronomy, 
PModel has a broad scope of applicability, particularly because it can in principle 
distribute the evaluation of any expensive model, not merely the XSTAR compo- 
nents shown here. Both pvm_xstar and PModel are small open source packages, 
and have been employed at several institutes, on multicore desktops, workstation 
clusters, and high-performance parallel computers. They may be obtained by 
download from http://space.mit.edu/cxc/pvm_xstar/ or by contacting the
lead author. 